Overview of the Author Identification Task at PAN 2015

Authors

  • Efstathios Stamatatos
  • Walter Daelemans
  • Ben Verhoeven
  • Patrick Juola
  • Aurelio López-López
  • Martin Potthast
  • Benno Stein
Abstract

This paper presents an overview of the author identification task at the PAN-2015 evaluation lab. Similar to previous editions of PAN, this shared task focuses on the problem of author verification: given a set of documents by the same author and another document of unknown authorship, the task is to determine whether or not the known and unknown documents have the same author. However, in contrast to the setup of PAN-2013 and PAN-2014, as well as most previous work in this area, it is no longer assumed that all documents match in genre and topic. In other words, we study cross-topic and cross-genre author verification, a challenging yet realistic task. A new evaluation corpus was built, covering four languages (Dutch, English, Greek, and Spanish) and comprising a variety of genres and topics. A total of 18 teams participated in this task. Following the practice of previous PAN editions, software submissions were required and evaluated within the evaluation-as-a-service platform TIRA. Based on TIRA, we were able to define challenging baseline models using submissions from the corresponding shared tasks at PAN-2013 and PAN-2014. Analytical evaluation results are given, including statistical significance tests. Moreover, we examine the performance of a heterogeneous ensemble that combines all participant models, and we present a comprehensive review of the submitted methods.

Linda Cappellato, Nicola Ferro, Gareth Jones, and Eric San Juan (eds.): CLEF 2015 Labs and Workshops, Notebook Papers, 8-11 September, Toulouse, France. CEUR Workshop Proceedings. ISSN 1613-0073, http://ceur-ws.org/Vol-1391/, 2015.

Similar resources

Overview of the PAN/CLEF 2015 Evaluation Lab

This paper presents an overview of the PAN/CLEF evaluation lab. During the last decade, PAN has been established as the main forum of text mining research focusing on the identification of personal traits of authors left behind in texts unintentionally. PAN 2015 comprises three tasks: plagiarism detection, author identification and author profiling studying important variations of these problem...

Overview of the 5th Author Profiling Task at PAN 2017: Gender and Language Variety Identification in Twitter

This overview presents the framework and the results of the Author Profiling task at PAN 2017. This year's objective is to address gender and language variety identification. For this purpose, a Twitter corpus has been provided for four languages: Arabic, English, Portuguese, and Spanish. Altogether, the approaches of 22 participants are evaluated.

Overview of the Author Identification Task at PAN 2013

The author identification task at PAN-2013 focuses on author verification: given a set of documents by a single author and a questioned document, the problem is to determine whether the questioned document was written by that particular author. In this paper we present the evaluation setup, the performance measures, the new corpus we built for this task covering three languages and the e...

Overview of the Author Identification Task at PAN 2014

The author identification task at PAN-2014 focuses on author verification. Similar to PAN-2013, we are given a set of documents by the same author along with exactly one document of questioned authorship, and the task is to determine whether the known and the questioned documents are by the same author. In comparison to PAN-2013, a significantly larger corpus was built comprising hundreds...

Overview of the Author Identification Task at PAN-2017: Style Breach Detection and Author Clustering

Several authorship analysis tasks require the decomposition of a multiauthored text into its authorial components. In this regard two basic prerequisites need to be addressed: (1) style breach detection, i.e., the segmenting of a text into stylistically homogeneous parts, and (2) author clustering, i.e., the grouping of paragraph-length texts by authorship. In the current edition of PAN we focu...

Publication date: 2015